AITopics | compression error

compression error

Rethinking gradient sparsification as total error minimization

Neural Information Processing Systems | Apr-25-2026, 15:45:54 GMT

Gradient compression is a widely established remedy for the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-k sparsification, sometimes with k as little as 0.1% of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization perspective, we find that Top-k is the communication-optimal sparsifier given a per-iteration budget of k elements. We argue that to further the benefits of gradient sparsification, especially for DNNs, a different perspective is necessary -- one that moves from per-iteration optimality to optimality over the entire training. We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training.
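The quantities in this abstract can be made concrete with a small sketch. The code below is an illustration only, not the paper's implementation: the function names `top_k`, `ef_step`, and `total_error` are hypothetical, and measuring each iteration's compression error by its squared Euclidean norm is an assumption that the abstract does not spell out.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def ef_step(grad, memory, k):
    """One error-feedback step: compress (grad + memory), carry the residual."""
    corrected = [g + m for g, m in zip(grad, memory)]
    sent = top_k(corrected, k)
    new_memory = [c - s for c, s in zip(corrected, sent)]
    return sent, new_memory


def total_error(grads, k):
    """Sum over all iterations of the squared norm of the compression error.

    The squared-norm measure is an assumption made for this sketch.
    """
    memory = [0.0] * len(grads[0])
    total = 0.0
    for g in grads:
        _, memory = ef_step(g, memory, k)
        total += sum(e * e for e in memory)
    return total
```

For example, `total_error([[3.0, -1.0, 0.5], [0.2, 2.0, -0.1]], k=1)` accumulates the residual left behind at each of the two iterations, while setting `k` to the full dimension sends everything and leaves a total error of zero.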

artificial intelligence, compressor, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

f5c3dd7514bf620a1b85450d2ae374b1-Paper.pdf

Neural Information Processing Systems | Feb-11-2026, 22:48:56 GMT

generalization, neural network, pruning, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

b0ab42fcb7133122b38521d13da7120b-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 17:59:10 GMT

co 0, compression, gradient, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
South America > Brazil > São Paulo (0.04)
North America > United States > Oregon (0.04)
(4 more...)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

77f2d0c271e508278ea13e24cd8773d5-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 23:13:53 GMT

max 1, neolithic, prog, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

6fb9ea5197c0b8ece8a64220fb82cdfe-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 17:15:09 GMT

algorithm, compressor, ef-bv, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Saudi Arabia (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Data Science (0.93)

Escaping Saddle Points with Compressed SGD

Neural Information Processing Systems | Feb-8-2026, 17:46:32 GMT

Stochastic Gradient Descent (SGD) and its variants are the main workhorses of modern machine learning. Distributed implementations of SGD on a cluster of machines with a central server and a large number of workers are frequently used in practice due to the massive size of the data. In distributed SGD each machine holds a copy of the model and the computation proceeds in rounds. In every round, each worker finds a stochastic gradient based on its batch of examples, the server averages these stochastic gradients to obtain the gradient of the entire batch, makes an SGD step, and broadcasts the updated model parameters to the workers. With a large number of workers, computation parallelizes efficiently while communication becomes the main bottleneck [Chilimbi et al., 2014, Strom, 2015], since each worker needs to send its gradients to the server and receive the updated model parameters. Common solutions for this problem include: local SGD and its variants, when each machine performs multiple local steps before communication [Stich, 2018]; decentralized architectures which allow pairwise communication between the workers [McMahan et al., 2017]; and gradient compression, when a compressed version of the gradient is communicated instead of the full gradient [Bernstein et al., 2018, Stich et al., 2018, Karimireddy et al., 2019]. In this work, we consider the latter approach, which we refer to as compressed SGD. Most machine learning models can be described by a d-dimensional vector of parameters x, and the model quality can be estimated as a function f(x).
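The round structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the paper's algorithm: `compressed_sgd_round` and `scaled_sign` are hypothetical names, the learning rate `lr` is a free parameter, and the scaled-sign compressor is only one illustrative choice of compression.

```python
def scaled_sign(vec):
    """Illustrative compressor (an assumption): the sign of each entry,
    scaled by the mean absolute value of the vector."""
    scale = sum(abs(v) for v in vec) / len(vec)
    return [scale * (1.0 if v > 0 else -1.0 if v < 0 else 0.0) for v in vec]


def compressed_sgd_round(x, worker_grads, compress, lr):
    """One round: every worker sends a compressed stochastic gradient,
    the server averages the messages and takes an SGD step.

    The returned list stands for the updated parameters that the server
    would broadcast back to the workers.
    """
    messages = [compress(g) for g in worker_grads]
    n = len(messages)
    avg = [sum(m[i] for m in messages) / n for i in range(len(x))]
    return [xi - lr * ai for xi, ai in zip(x, avg)]
```

Passing `compress=lambda g: g` recovers the uncompressed round, which makes it easy to compare the two updates on the same worker gradients.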

artificial intelligence, iteration, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Rethinking gradient sparsification as total error minimization

Neural Information Processing Systems | Feb-8-2026, 10:07:40 GMT

We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training.

artificial intelligence, compressor, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Kansas (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

ErrorCompensatedX: error compensation for variance reduced algorithms

Neural Information Processing Systems | Dec-24-2025, 13:05:35 GMT

Communication cost is a major bottleneck for the scalability of distributed learning. One approach to reducing the communication cost is to compress the gradient during communication. However, directly compressing the gradient slows convergence, and the resulting algorithm may diverge for biased compression. Recent work addressed this problem for stochastic gradient descent by adding back the compression error from the previous step. This idea was further extended to one class of variance reduced algorithms, where the variance of the stochastic gradient is reduced by taking a moving average over all history gradients. However, our analysis shows that just adding the previous step's compression error, as done in existing work, does not fully compensate for the compression error. So, we propose ErrorCompensatedX, which uses the compression error from the previous two steps. We show that ErrorCompensatedX can achieve the same asymptotic convergence rate as training without compression. Moreover, we provide a unified theoretical analysis framework for this class of variance reduced algorithms, with or without error compensation.
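To see why adding back the previous step's compression error helps with a biased compressor, consider the following sketch. It illustrates only the one-step compensation that the abstract attributes to earlier work on SGD, not ErrorCompensatedX itself, whose two-step update is not given here. The names `quantize` and `run` and the rounding compressor are assumptions made for illustration.

```python
import math


def quantize(vec, step=0.5):
    """Hypothetical biased compressor: round each entry to a multiple of step."""
    return [math.floor(v / step + 0.5) * step for v in vec]


def run(grads, compress, compensate):
    """Return the sum of all transmitted vectors and the last compression error.

    With compensate=True, the previous step's compression error is added
    back to the gradient before compressing.
    """
    dim = len(grads[0])
    error = [0.0] * dim
    sent_sum = [0.0] * dim
    for g in grads:
        target = [gi + ei for gi, ei in zip(g, error)] if compensate else list(g)
        sent = compress(target)
        error = [t - s for t, s in zip(target, sent)]
        sent_sum = [a + s for a, s in zip(sent_sum, sent)]
    return sent_sum, error
```

With four identical gradients `[0.25, -0.25]`, whose true sum is `[1.0, -1.0]`, the uncompensated run transmits a total of `[2.0, 0.0]`, while the compensated run transmits exactly `[1.0, -1.0]`. In general, with compensation the transmitted total plus the final error equals the sum of the gradients, so the rounding bias does not accumulate.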

algorithm, error compensation, variance, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)
